Dataset statistics
| | Dataset A | Dataset B |
|---|---|---|
| Number of variables | 12 | 12 |
| Number of observations | 446 | 446 |
| Missing cells | 416 | 428 |
| Missing cells (%) | 7.8% | 8.0% |
| Duplicate rows | 0 | 0 |
| Duplicate rows (%) | 0.0% | 0.0% |
| Total size in memory | 45.3 KiB | 45.3 KiB |
| Average record size in memory | 104.0 B | 104.0 B |
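The percentage and size rows above can be cross-checked from the counts in the same table. The snippet below is a minimal sketch, assuming that "Missing cells (%)" is the share of empty cells among all rows × columns and that 1 KiB is 1024 bytes; the function and variable names are illustrative only.

```python
# Values copied from the "Dataset statistics" table.
N_VARIABLES = 12
N_OBSERVATIONS = 446
MISSING_CELLS = {"Dataset A": 416, "Dataset B": 428}
AVG_RECORD_BYTES = 104.0


def missing_cell_pct(missing: int, n_rows: int, n_cols: int) -> float:
    """Share of empty cells among all rows x columns cells, in percent."""
    return 100.0 * missing / (n_rows * n_cols)


for name, missing in MISSING_CELLS.items():
    pct = missing_cell_pct(missing, N_OBSERVATIONS, N_VARIABLES)
    print(f"{name}: {pct:.1f}% missing cells")

# Total size implied by the average record size, in KiB.
total_kib = AVG_RECORD_BYTES * N_OBSERVATIONS / 1024
print(f"Total size in memory: {total_kib:.1f} KiB")
```

Under these assumptions the results round to the reported 7.8%, 8.0% and 45.3 KiB.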
Variable types
| | Dataset A | Dataset B |
|---|---|---|
| Numeric | 5 | 5 |
| Categorical | 7 | 7 |
Alerts
| Dataset A | Dataset B | Alert type |
|---|---|---|
| Name has a high cardinality: 446 distinct values | Name has a high cardinality: 446 distinct values | High Cardinality |
| Ticket has a high cardinality: 379 distinct values | Ticket has a high cardinality: 376 distinct values | High Cardinality |
| Cabin has a high cardinality: 98 distinct values | Cabin has a high cardinality: 81 distinct values | High Cardinality |
| Fare is highly overall correlated with Pclass | Fare is highly overall correlated with Pclass | High Correlation |
| Survived is highly overall correlated with Sex | Survived is highly overall correlated with Sex | High Correlation |
| Pclass is highly overall correlated with Fare | Pclass is highly overall correlated with Fare | High Correlation |
| Sex is highly overall correlated with Survived | Sex is highly overall correlated with Survived | High Correlation |
| Age has 82 (18.4%) missing values | Age has 82 (18.4%) missing values | Missing |
| Cabin has 333 (74.7%) missing values | Cabin has 345 (77.4%) missing values | Missing |
| Name is uniformly distributed | Name is uniformly distributed | Uniform |
| Ticket is uniformly distributed | Ticket is uniformly distributed | Uniform |
| Cabin is uniformly distributed | Cabin is uniformly distributed | Uniform |
| PassengerId has unique values | PassengerId has unique values | Unique |
| Name has unique values | Name has unique values | Unique |
| SibSp has 295 (66.1%) zeros | SibSp has 294 (65.9%) zeros | Zeros |
| Parch has 336 (75.3%) zeros | Parch has 333 (74.7%) zeros | Zeros |
| Fare has 6 (1.3%) zeros | Fare has 10 (2.2%) zeros | Zeros |
Reproduction
| | Dataset A | Dataset B |
|---|---|---|
| Analysis started | 2023-01-30 17:09:49.374474 | 2023-01-30 17:09:54.506919 |
| Analysis finished | 2023-01-30 17:09:54.503590 | 2023-01-30 17:09:58.545820 |
| Duration | 5.13 seconds | 4.04 seconds |
| Software version | ydata-profiling v0.0.dev0 | ydata-profiling v0.0.dev0 |
| Download configuration | config.json | config.json |
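A side-by-side report of this kind is typically produced by profiling each dataset separately and then comparing the two profiles. The following is a minimal sketch, assuming a ydata-profiling release that provides `ProfileReport.compare`; the helper name `build_comparison`, the report titles, and the output path are hypothetical and are not taken from this report's `config.json`.

```python
def build_comparison(df_a, df_b, output_path="comparison.html"):
    """Profile two pandas DataFrames and write a side-by-side comparison.

    The import is kept inside the function so that this sketch can be loaded
    without ydata-profiling installed. Titles and output path are illustrative.
    """
    from ydata_profiling import ProfileReport

    report_a = ProfileReport(df_a, title="Dataset A")
    report_b = ProfileReport(df_b, title="Dataset B")

    # compare() returns a new report that shows both datasets side by side.
    comparison = report_a.compare(report_b)
    comparison.to_file(output_path)
    return comparison
```

Calling `build_comparison(df_a, df_b)` with two DataFrames that share the same columns would write an HTML file with the two datasets in adjacent columns, as in the tables here.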
PassengerId
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 446 | 446 |
| Distinct (%) | 100.0% | 100.0% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 456.78924 | 459.75112 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 1 | 1 |
| Maximum | 890 | 889 |
| Zeros | 0 | 0 |
| Zeros (%) | 0.0% | 0.0% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 1 | 1 |
| 5-th percentile | 38.5 | 50.25 |
| Q1 | 231.75 | 249.25 |
| median | 460 | 467.5 |
| Q3 | 685.25 | 671.75 |
| 95-th percentile | 854.25 | 853 |
| Maximum | 890 | 889 |
| Range | 889 | 888 |
| Interquartile range (IQR) | 453.5 | 422.5 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 261.11357 | 255.6025 |
| Coefficient of variation (CV) | 0.57162812 | 0.5559584 |
| Kurtosis | -1.1928291 | -1.1677702 |
| Mean | 456.78924 | 459.75112 |
| Median Absolute Deviation (MAD) | 226.5 | 215 |
| Skewness | -0.055220217 | -0.066689484 |
| Sum | 203728 | 205049 |
| Variance | 68180.297 | 65332.637 |
| Monotonicity | Not monotonic | Not monotonic |
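Several PassengerId rows are derived from others, so the reported figures can be checked against each other. The sketch below assumes the usual definitions (mean = sum / n, variance = standard deviation squared, CV = standard deviation / mean, IQR = Q3 − Q1, range = maximum − minimum); the dictionary layout and function name are illustrative, and the numbers are copied from the tables above.

```python
import math

# Reported PassengerId statistics for each dataset.
REPORTED = {
    "Dataset A": dict(n=446, total=203728, mean=456.78924, std=261.11357,
                      variance=68180.297, cv=0.57162812,
                      q1=231.75, q3=685.25, iqr=453.5,
                      minimum=1, maximum=890, value_range=889),
    "Dataset B": dict(n=446, total=205049, mean=459.75112, std=255.6025,
                      variance=65332.637, cv=0.5559584,
                      q1=249.25, q3=671.75, iqr=422.5,
                      minimum=1, maximum=889, value_range=888),
}


def consistency_checks(s: dict, rel_tol: float = 1e-6) -> dict:
    """Return a pass/fail flag for each relation between reported values."""
    return {
        "mean == sum / n": math.isclose(s["total"] / s["n"], s["mean"], rel_tol=rel_tol),
        "variance == std ** 2": math.isclose(s["std"] ** 2, s["variance"], rel_tol=rel_tol),
        "cv == std / mean": math.isclose(s["std"] / s["mean"], s["cv"], rel_tol=rel_tol),
        "iqr == q3 - q1": math.isclose(s["q3"] - s["q1"], s["iqr"], rel_tol=rel_tol),
        "range == max - min": s["maximum"] - s["minimum"] == s["value_range"],
    }


for name, stats in REPORTED.items():
    for label, ok in consistency_checks(stats).items():
        print(f"{name}: {label}: {'ok' if ok else 'MISMATCH'}")
```

The tolerance only absorbs the rounding of the printed values; it does not say which estimator (sample or population) the tool uses.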
| Value | Count | Frequency (%) |
| 428 | 1 | 0.2% |
| 748 | 1 | 0.2% |
| 790 | 1 | 0.2% |
| 91 | 1 | 0.2% |
| 556 | 1 | 0.2% |
| 342 | 1 | 0.2% |
| 862 | 1 | 0.2% |
| 258 | 1 | 0.2% |
| 228 | 1 | 0.2% |
| 495 | 1 | 0.2% |
| Other values (436) | 436 |
| Value | Count | Frequency (%) |
| 676 | 1 | 0.2% |
| 645 | 1 | 0.2% |
| 62 | 1 | 0.2% |
| 854 | 1 | 0.2% |
| 357 | 1 | 0.2% |
| 319 | 1 | 0.2% |
| 358 | 1 | 0.2% |
| 889 | 1 | 0.2% |
| 97 | 1 | 0.2% |
| 706 | 1 | 0.2% |
| Other values (436) | 436 |
| Value | Count | Frequency (%) |
| 1 | 1 | |
| 2 | 1 | |
| 3 | 1 | |
| 5 | 1 | |
| 7 | 1 | |
| 8 | 1 | |
| 9 | 1 | |
| 10 | 1 | |
| 13 | 1 | |
| 15 | 1 |
| Value | Count | Frequency (%) |
| 1 | 1 | |
| 3 | 1 | |
| 4 | 1 | |
| 5 | 1 | |
| 6 | 1 | |
| 15 | 1 | |
| 16 | 1 | |
| 19 | 1 | |
| 21 | 1 | |
| 25 | 1 |
Survived
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 2 | 2 |
| Distinct (%) | 0.4% | 0.4% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 1 | 1 |
| Median length | 1 | 1 |
| Mean length | 1 | 1 |
| Min length | 1 | 1 |
Characters and Unicode
| | Dataset A | Dataset B |
|---|---|---|
| Total characters | 446 | 446 |
| Distinct characters | 2 | 2 |
| Distinct categories | 1 | 1 |
| Distinct scripts | 1 | 1 |
| Distinct blocks | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | 1 | 0 |
| 2nd row | 0 | 1 |
| 3rd row | 0 | 0 |
| 4th row | 1 | 0 |
| 5th row | 0 | 0 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 277 | |
| 1 | 169 |
| Value | Count | Frequency (%) |
| 0 | 282 | |
| 1 | 164 |
Plots (not reproduced): Length; Common Values (Plot) for Dataset A and Dataset B
Words
| Value | Count | Frequency (%) |
| 0 | 277 | |
| 1 | 169 |
| Value | Count | Frequency (%) |
| 0 | 282 | |
| 1 | 164 |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 277 | |
| 1 | 169 |
| Value | Count | Frequency (%) |
| 0 | 282 | |
| 1 | 164 |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 446 |
| Value | Count | Frequency (%) |
| Decimal Number | 446 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 277 | |
| 1 | 169 |
| Value | Count | Frequency (%) |
| 0 | 282 | |
| 1 | 164 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 446 |
| Value | Count | Frequency (%) |
| Common | 446 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 277 | |
| 1 | 169 |
| Value | Count | Frequency (%) |
| 0 | 282 | |
| 1 | 164 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 446 |
| Value | Count | Frequency (%) |
| ASCII | 446 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 277 | |
| 1 | 169 |
| Value | Count | Frequency (%) |
| 0 | 282 | |
| 1 | 164 |
Pclass
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 3 | 3 |
| Distinct (%) | 0.7% | 0.7% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 1 | 1 |
| Median length | 1 | 1 |
| Mean length | 1 | 1 |
| Min length | 1 | 1 |
Characters and Unicode
| | Dataset A | Dataset B |
|---|---|---|
| Total characters | 446 | 446 |
| Distinct characters | 3 | 3 |
| Distinct categories | 1 | 1 |
| Distinct scripts | 1 | 1 |
| Distinct blocks | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | 2 | 3 |
| 2nd row | 2 | 3 |
| 3rd row | 3 | 3 |
| 4th row | 3 | 2 |
| 5th row | 3 | 1 |
Common Values
| Value | Count | Frequency (%) |
| 3 | 238 | |
| 1 | 116 | |
| 2 | 92 | 20.6% |
| Value | Count | Frequency (%) |
| 3 | 251 | |
| 1 | 108 | |
| 2 | 87 | 19.5% |
Plots (not reproduced): Length; Common Values (Plot) for Dataset A and Dataset B
Words
| Value | Count | Frequency (%) |
| 3 | 238 | |
| 1 | 116 | |
| 2 | 92 | 20.6% |
| Value | Count | Frequency (%) |
| 3 | 251 | |
| 1 | 108 | |
| 2 | 87 | 19.5% |
Most occurring characters
| Value | Count | Frequency (%) |
| 3 | 238 | |
| 1 | 116 | |
| 2 | 92 | 20.6% |
| Value | Count | Frequency (%) |
| 3 | 251 | |
| 1 | 108 | |
| 2 | 87 | 19.5% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 446 |
| Value | Count | Frequency (%) |
| Decimal Number | 446 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 3 | 238 | |
| 1 | 116 | |
| 2 | 92 | 20.6% |
| Value | Count | Frequency (%) |
| 3 | 251 | |
| 1 | 108 | |
| 2 | 87 | 19.5% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 446 |
| Value | Count | Frequency (%) |
| Common | 446 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 3 | 238 | |
| 1 | 116 | |
| 2 | 92 | 20.6% |
| Value | Count | Frequency (%) |
| 3 | 251 | |
| 1 | 108 | |
| 2 | 87 | 19.5% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 446 |
| Value | Count | Frequency (%) |
| ASCII | 446 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 3 | 238 | |
| 1 | 116 | |
| 2 | 92 | 20.6% |
| Value | Count | Frequency (%) |
| 3 | 251 | |
| 1 | 108 | |
| 2 | 87 | 19.5% |
Name
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 446 | 446 |
| Distinct (%) | 100.0% | 100.0% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Dataset A
| Value | Count |
|---|---|
| Phillips, Miss. Kate Florence ("Mrs Kate Louise Phillips Marshall") | 1 |
| Sinkkonen, Miss. Anna | 1 |
| Guggenheim, Mr. Benjamin | 1 |
| Christmann, Mr. Emil | 1 |
| Wright, Mr. George | 1 |
| Other values (441) | |
Dataset B
| Value | Count |
|---|---|
| Edvardsson, Mr. Gustaf Hjalmar | 1 |
| Baclini, Miss. Eugenie | 1 |
| Icard, Miss. Amelie | 1 |
| Lines, Miss. Mary Conover | 1 |
| Bowerman, Miss. Elsie Edith | 1 |
| Other values (441) | |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 82 | 67 |
| Median length | 49.5 | 49 |
| Mean length | 27.329596 | 27.159193 |
| Min length | 13 | 13 |
Characters and Unicode
| | Dataset A | Dataset B |
|---|---|---|
| Total characters | 12189 | 12113 |
| Distinct characters | 60 | 59 |
| Distinct categories | 7 | 7 |
| Distinct scripts | 2 | 2 |
| Distinct blocks | 1 | 1 |
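The "Distinct categories" row counts Unicode general categories across all characters in the column. As a rough illustration of how such counts can be obtained, the sketch below uses Python's standard `unicodedata` module on two names that appear in this report; the mapping of category codes to display names and the function name are assumptions for readability, not the profiler's own code.

```python
import unicodedata
from collections import Counter

# Display names for the general categories seen in the Name column.
CATEGORY_NAMES = {
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Zs": "Space Separator",
    "Po": "Other Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Pd": "Dash Punctuation",
    "Nd": "Decimal Number",
}


def category_counts(values):
    """Count characters per Unicode general category over all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


sample = ["Wright, Mr. George", "Strom, Mrs. Wilhelm (Elna Matilda Persson)"]
counts = category_counts(sample)
for category, count in counts.most_common():
    print(f"{category}: {count}")
```

`unicodedata` does not expose scripts or blocks, so the "Distinct scripts" and "Distinct blocks" rows would need a different data source.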
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 446 | 446 |
| Unique (%) | 100.0% | 100.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | Phillips, Miss. Kate Florence ("Mrs Kate Louise Phillips Marshall") | Edvardsson, Mr. Gustaf Hjalmar |
| 2nd row | Gill, Mr. John William | Landergren, Miss. Aurora Adelia |
| 3rd row | Hakkarainen, Mr. Pekka Pietari | Strom, Mrs. Wilhelm (Elna Matilda Persson) |
| 4th row | Heikkinen, Miss. Laina | Harris, Mr. Walter |
| 5th row | Cribb, Mr. John Hatfield | Andrews, Mr. Thomas Jr |
Common Values
| Value | Count | Frequency (%) |
| Phillips, Miss. Kate Florence ("Mrs Kate Louise Phillips Marshall") | 1 | 0.2% |
| Sinkkonen, Miss. Anna | 1 | 0.2% |
| Guggenheim, Mr. Benjamin | 1 | 0.2% |
| Christmann, Mr. Emil | 1 | 0.2% |
| Wright, Mr. George | 1 | 0.2% |
| Fortune, Miss. Alice Elizabeth | 1 | 0.2% |
| Giles, Mr. Frederick Edward | 1 | 0.2% |
| Cherry, Miss. Gladys | 1 | 0.2% |
| Lovell, Mr. John Hall ("Henry") | 1 | 0.2% |
| Stanley, Mr. Edward Roland | 1 | 0.2% |
| Other values (436) | 436 |
| Value | Count | Frequency (%) |
| Edvardsson, Mr. Gustaf Hjalmar | 1 | 0.2% |
| Baclini, Miss. Eugenie | 1 | 0.2% |
| Icard, Miss. Amelie | 1 | 0.2% |
| Lines, Miss. Mary Conover | 1 | 0.2% |
| Bowerman, Miss. Elsie Edith | 1 | 0.2% |
| Wick, Miss. Mary Natalie | 1 | 0.2% |
| Funk, Miss. Annie Clemmer | 1 | 0.2% |
| Johnston, Miss. Catherine Helen "Carrie" | 1 | 0.2% |
| Goldschmidt, Mr. George B | 1 | 0.2% |
| Morley, Mr. Henry Samuel ("Mr Henry Marshall") | 1 | 0.2% |
| Other values (436) | 436 |
Plots (not reproduced): Length; Common Values (Plot)
Dataset A: Number of variable categories passes threshold (config.plot.cat_freq.max_unique)
Dataset B: Number of variable categories passes threshold (config.plot.cat_freq.max_unique)
Words
| Value | Count | Frequency (%) |
| mr | 260 | 14.2% |
| miss | 97 | 5.3% |
| mrs | 64 | 3.5% |
| william | 29 | 1.6% |
| john | 24 | 1.3% |
| master | 20 | 1.1% |
| henry | 18 | 1.0% |
| charles | 14 | 0.8% |
| george | 14 | 0.8% |
| thomas | 12 | 0.7% |
| Other values (896) | 1281 |
| Value | Count | Frequency (%) |
| mr | 257 | 14.1% |
| miss | 91 | 5.0% |
| mrs | 71 | 3.9% |
| william | 35 | 1.9% |
| john | 22 | 1.2% |
| master | 20 | 1.1% |
| henry | 16 | 0.9% |
| thomas | 13 | 0.7% |
| george | 13 | 0.7% |
| edward | 12 | 0.7% |
| Other values (872) | 1279 |
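The word tables above list lowercased tokens such as `mr` and `miss` without their trailing periods. A minimal sketch of one tokenization that produces counts of this shape is shown below; splitting on whitespace and stripping punctuation from both ends of each token is an assumption, and the profiler's actual tokenizer may differ. The sample names are taken from this report.

```python
import string
from collections import Counter


def word_counts(values):
    """Lowercase, split on whitespace, strip edge punctuation, and count."""
    counts = Counter()
    for value in values:
        for token in value.lower().split():
            word = token.strip(string.punctuation)
            if word:
                counts[word] += 1
    return counts


names = [
    "Guggenheim, Mr. Benjamin",
    "Christmann, Mr. Emil",
    "Wright, Mr. George",
    "Sinkkonen, Miss. Anna",
    "Lines, Miss. Mary Conover",
]
counts = word_counts(names)
print(counts.most_common(2))
```

Because only edge punctuation is stripped, tokens with internal punctuation stay intact, which matches the form of the entries in the tables.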
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 1389 | 11.4% |
| r | 1000 | 8.2% |
| e | 880 | 7.2% |
| a | 827 | 6.8% |
| n | 688 | 5.6% |
| i | 676 | 5.5% |
| s | 637 | 5.2% |
| M | 577 | 4.7% |
| l | 551 | 4.5% |
| o | 519 | 4.3% |
| Other values (50) | 4445 |
| Value | Count | Frequency (%) |
| (space) | 1384 | 11.4% |
| r | 994 | 8.2% |
| a | 842 | 7.0% |
| e | 839 | 6.9% |
| i | 692 | 5.7% |
| s | 655 | 5.4% |
| n | 649 | 5.4% |
| M | 571 | 4.7% |
| l | 538 | 4.4% |
| o | 512 | 4.2% |
| Other values (49) | 4437 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 7856 | |
| Uppercase Letter | 1841 | 15.1% |
| Space Separator | 1389 | 11.4% |
| Other Punctuation | 954 | 7.8% |
| Close Punctuation | 71 | 0.6% |
| Open Punctuation | 71 | 0.6% |
| Dash Punctuation | 7 | 0.1% |
| Value | Count | Frequency (%) |
| Lowercase Letter | 7785 | |
| Uppercase Letter | 1843 | 15.2% |
| Space Separator | 1384 | 11.4% |
| Other Punctuation | 942 | 7.8% |
| Close Punctuation | 77 | 0.6% |
| Open Punctuation | 77 | 0.6% |
| Dash Punctuation | 5 | < 0.1% |
Most frequent character per category
Space Separator
| Value | Count | Frequency (%) |
| (space) | 1389 | |
| Value | Count | Frequency (%) |
| (space) | 1384 | |
Lowercase Letter
| Value | Count | Frequency (%) |
| r | 1000 | |
| e | 880 | |
| a | 827 | |
| n | 688 | |
| i | 676 | |
| s | 637 | |
| l | 551 | 7.0% |
| o | 519 | 6.6% |
| t | 327 | 4.2% |
| h | 282 | 3.6% |
| Other values (16) | 1469 |
| Value | Count | Frequency (%) |
| r | 994 | |
| a | 842 | |
| e | 839 | |
| i | 692 | |
| s | 655 | |
| n | 649 | |
| l | 538 | 6.9% |
| o | 512 | 6.6% |
| t | 323 | 4.1% |
| h | 260 | 3.3% |
| Other values (16) | 1481 |
Uppercase Letter
| Value | Count | Frequency (%) |
| M | 577 | |
| A | 127 | 6.9% |
| J | 114 | 6.2% |
| S | 90 | 4.9% |
| C | 88 | 4.8% |
| H | 88 | 4.8% |
| E | 87 | 4.7% |
| B | 71 | 3.9% |
| W | 71 | 3.9% |
| L | 64 | 3.5% |
| Other values (15) | 464 |
| Value | Count | Frequency (%) |
| M | 571 | |
| A | 124 | 6.7% |
| J | 111 | 6.0% |
| H | 104 | 5.6% |
| E | 92 | 5.0% |
| C | 87 | 4.7% |
| S | 79 | 4.3% |
| W | 77 | 4.2% |
| B | 72 | 3.9% |
| R | 61 | 3.3% |
| Other values (15) | 465 |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 447 | |
| , | 446 | |
| " | 54 | 5.7% |
| ' | 6 | 0.6% |
| / | 1 | 0.1% |
| Value | Count | Frequency (%) |
| , | 446 | |
| . | 446 | |
| " | 46 | 4.9% |
| ' | 4 | 0.4% |
Close Punctuation
| Value | Count | Frequency (%) |
| ) | 71 |
| Value | Count | Frequency (%) |
| ) | 77 |
Open Punctuation
| Value | Count | Frequency (%) |
| ( | 71 |
| Value | Count | Frequency (%) |
| ( | 77 |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 7 |
| Value | Count | Frequency (%) |
| - | 5 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 9697 | |
| Common | 2492 | 20.4% |
| Value | Count | Frequency (%) |
| Latin | 9628 | |
| Common | 2485 | 20.5% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| (space) | 1389 | |
| . | 447 | 17.9% |
| , | 446 | 17.9% |
| ) | 71 | 2.8% |
| ( | 71 | 2.8% |
| " | 54 | 2.2% |
| - | 7 | 0.3% |
| ' | 6 | 0.2% |
| / | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| (space) | 1384 | |
| , | 446 | 17.9% |
| . | 446 | 17.9% |
| ) | 77 | 3.1% |
| ( | 77 | 3.1% |
| " | 46 | 1.9% |
| - | 5 | 0.2% |
| ' | 4 | 0.2% |
Latin
| Value | Count | Frequency (%) |
| r | 1000 | 10.3% |
| e | 880 | 9.1% |
| a | 827 | 8.5% |
| n | 688 | 7.1% |
| i | 676 | 7.0% |
| s | 637 | 6.6% |
| M | 577 | 6.0% |
| l | 551 | 5.7% |
| o | 519 | 5.4% |
| t | 327 | 3.4% |
| Other values (41) | 3015 |
| Value | Count | Frequency (%) |
| r | 994 | 10.3% |
| a | 842 | 8.7% |
| e | 839 | 8.7% |
| i | 692 | 7.2% |
| s | 655 | 6.8% |
| n | 649 | 6.7% |
| M | 571 | 5.9% |
| l | 538 | 5.6% |
| o | 512 | 5.3% |
| t | 323 | 3.4% |
| Other values (41) | 3013 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 12189 |
| Value | Count | Frequency (%) |
| ASCII | 12113 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| (space) | 1389 | 11.4% |
| r | 1000 | 8.2% |
| e | 880 | 7.2% |
| a | 827 | 6.8% |
| n | 688 | 5.6% |
| i | 676 | 5.5% |
| s | 637 | 5.2% |
| M | 577 | 4.7% |
| l | 551 | 4.5% |
| o | 519 | 4.3% |
| Other values (50) | 4445 |
| Value | Count | Frequency (%) |
| (space) | 1384 | 11.4% |
| r | 994 | 8.2% |
| a | 842 | 7.0% |
| e | 839 | 6.9% |
| i | 692 | 5.7% |
| s | 655 | 5.4% |
| n | 649 | 5.4% |
| M | 571 | 4.7% |
| l | 538 | 4.4% |
| o | 512 | 4.2% |
| Other values (49) | 4437 |
Sex
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 2 | 2 |
| Distinct (%) | 0.4% | 0.4% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 6 | 6 |
| Median length | 4 | 4 |
| Mean length | 4.7174888 | 4.7219731 |
| Min length | 4 | 4 |
Characters and Unicode
| | Dataset A | Dataset B |
|---|---|---|
| Total characters | 2104 | 2106 |
| Distinct characters | 5 | 5 |
| Distinct categories | 1 | 1 |
| Distinct scripts | 1 | 1 |
| Distinct blocks | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | female | male |
| 2nd row | male | female |
| 3rd row | male | female |
| 4th row | female | male |
| 5th row | male | male |
Common Values
| Value | Count | Frequency (%) |
| male | 286 | |
| female | 160 |
| Value | Count | Frequency (%) |
| male | 285 | |
| female | 161 |
Plots (not reproduced): Length; Common Values (Plot) for Dataset A and Dataset B
Words
| Value | Count | Frequency (%) |
| male | 286 | |
| female | 160 |
| Value | Count | Frequency (%) |
| male | 285 | |
| female | 161 |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 606 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 160 | 7.6% |
| Value | Count | Frequency (%) |
| e | 607 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 161 | 7.6% |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 2104 |
| Value | Count | Frequency (%) |
| Lowercase Letter | 2106 |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| e | 606 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 160 | 7.6% |
| Value | Count | Frequency (%) |
| e | 607 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 161 | 7.6% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 2104 |
| Value | Count | Frequency (%) |
| Latin | 2106 |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| e | 606 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 160 | 7.6% |
| Value | Count | Frequency (%) |
| e | 607 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 161 | 7.6% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 2104 |
| Value | Count | Frequency (%) |
| ASCII | 2106 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| e | 606 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 160 | 7.6% |
| Value | Count | Frequency (%) |
| e | 607 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 161 | 7.6% |
Age
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 76 | 73 |
| Distinct (%) | 20.9% | 20.1% |
| Missing | 82 | 82 |
| Missing (%) | 18.4% | 18.4% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 29.88783 | 29.081264 |
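For Age, the percentages do not all share the same denominator. The reported figures are consistent with "Missing (%)" being taken over all 446 observations, while "Distinct (%)" and the mean are taken over the 364 non-missing values. The sketch below checks this reading using the counts above and the Sum reported under Descriptive statistics; it is an inference from the numbers, not a statement about the profiler's implementation, and the names are illustrative.

```python
N_OBSERVATIONS = 446

# Counts and sums copied from the Age tables of this report.
AGE = {
    "Dataset A": {"distinct": 76, "missing": 82, "sum": 10879.17},
    "Dataset B": {"distinct": 73, "missing": 82, "sum": 10585.58},
}


def summarize(stats: dict, n_rows: int = N_OBSERVATIONS) -> dict:
    """Recompute Age percentages and mean from the reported counts."""
    non_missing = n_rows - stats["missing"]
    return {
        "missing_pct": 100 * stats["missing"] / n_rows,
        "distinct_pct": 100 * stats["distinct"] / non_missing,
        "mean": stats["sum"] / non_missing,
    }


for name, stats in AGE.items():
    s = summarize(stats)
    print(f"{name}: missing {s['missing_pct']:.1f}%, "
          f"distinct {s['distinct_pct']:.1f}%, mean {s['mean']:.5f}")
```

Dividing the distinct counts by 446 instead would give about 17.0% and 16.4%, which do not match the table.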
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0.42 | 0.75 |
| Maximum | 80 | 71 |
| Zeros | 0 | 0 |
| Zeros (%) | 0.0% | 0.0% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0.42 | 0.75 |
| 5-th percentile | 4.15 | 5.15 |
| Q1 | 20 | 19.75 |
| median | 28 | 27.5 |
| Q3 | 39 | 38 |
| 95-th percentile | 59.7 | 54.85 |
| Maximum | 80 | 71 |
| Range | 79.58 | 70.25 |
| Interquartile range (IQR) | 19 | 18.25 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 15.119665 | 14.321352 |
| Coefficient of variation (CV) | 0.50588032 | 0.49245975 |
| Kurtosis | 0.31051106 | 0.029531589 |
| Mean | 29.88783 | 29.081264 |
| Median Absolute Deviation (MAD) | 9 | 8.5 |
| Skewness | 0.48654229 | 0.4206788 |
| Sum | 10879.17 | 10585.58 |
| Variance | 228.60426 | 205.10112 |
| Monotonicity | Not monotonic | Not monotonic |
| Value | Count | Frequency (%) |
| 19 | 18 | 4.0% |
| 30 | 15 | 3.4% |
| 28 | 14 | 3.1% |
| 18 | 14 | 3.1% |
| 22 | 12 | 2.7% |
| 24 | 12 | 2.7% |
| 21 | 12 | 2.7% |
| 31 | 11 | 2.5% |
| 39 | 11 | 2.5% |
| 36 | 11 | 2.5% |
| Other values (66) | 234 | |
| (Missing) | 82 | 18.4% |
| Value | Count | Frequency (%) |
| 19 | 16 | 3.6% |
| 25 | 15 | 3.4% |
| 22 | 15 | 3.4% |
| 18 | 14 | 3.1% |
| 24 | 14 | 3.1% |
| 21 | 13 | 2.9% |
| 32 | 12 | 2.7% |
| 30 | 11 | 2.5% |
| 16 | 11 | 2.5% |
| 28 | 11 | 2.5% |
| Other values (63) | 232 | |
| (Missing) | 82 | 18.4% |
| Value | Count | Frequency (%) |
| 0.42 | 1 | 0.2% |
| 0.67 | 1 | 0.2% |
| 0.75 | 1 | 0.2% |
| 0.83 | 1 | 0.2% |
| 1 | 3 | |
| 2 | 6 | |
| 3 | 1 | 0.2% |
| 4 | 5 | |
| 5 | 3 | |
| 6 | 3 |
| Value | Count | Frequency (%) |
| 0.75 | 1 | 0.2% |
| 0.83 | 1 | 0.2% |
| 1 | 5 | |
| 2 | 3 | |
| 3 | 3 | |
| 4 | 5 | |
| 5 | 1 | 0.2% |
| 6 | 3 | |
| 7 | 3 | |
| 8 | 2 | 0.4% |
SibSp
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 7 | 7 |
| Distinct (%) | 1.6% | 1.6% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 0.54484305 | 0.54932735 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| Maximum | 8 | 8 |
| Zeros | 295 | 294 |
| Zeros (%) | 66.1% | 65.9% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| 5-th percentile | 0 | 0 |
| Q1 | 0 | 0 |
| median | 0 | 0 |
| Q3 | 1 | 1 |
| 95-th percentile | 3 | 2 |
| Maximum | 8 | 8 |
| Range | 8 | 8 |
| Interquartile range (IQR) | 1 | 1 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 1.0898955 | 1.1242124 |
| Coefficient of variation (CV) | 2.0003844 | 2.0465254 |
| Kurtosis | 16.405745 | 18.251242 |
| Mean | 0.54484305 | 0.54932735 |
| Median Absolute Deviation (MAD) | 0 | 0 |
| Skewness | 3.4990617 | 3.7370127 |
| Sum | 243 | 245 |
| Variance | 1.1878722 | 1.2638535 |
| Monotonicity | Not monotonic | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 295 | |
| 1 | 114 | 25.6% |
| 2 | 12 | 2.7% |
| 4 | 11 | 2.5% |
| 3 | 9 | 2.0% |
| 8 | 3 | 0.7% |
| 5 | 2 | 0.4% |
| Value | Count | Frequency (%) |
| 0 | 294 | |
| 1 | 115 | 25.8% |
| 2 | 16 | 3.6% |
| 4 | 11 | 2.5% |
| 8 | 4 | 0.9% |
| 3 | 4 | 0.9% |
| 5 | 2 | 0.4% |
| Value | Count | Frequency (%) |
| 0 | 295 | |
| 1 | 114 | 25.6% |
| 2 | 12 | 2.7% |
| 3 | 9 | 2.0% |
| 4 | 11 | 2.5% |
| 5 | 2 | 0.4% |
| 8 | 3 | 0.7% |
| Value | Count | Frequency (%) |
| 0 | 294 | |
| 1 | 115 | 25.8% |
| 2 | 16 | 3.6% |
| 3 | 4 | 0.9% |
| 4 | 11 | 2.5% |
| 5 | 2 | 0.4% |
| 8 | 4 | 0.9% |
Parch
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 6 | 6 |
| Distinct (%) | 1.3% | 1.3% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 0.40134529 | 0.39686099 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| Maximum | 5 | 5 |
| Zeros | 336 | 333 |
| Zeros (%) | 75.3% | 74.7% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| 5-th percentile | 0 | 0 |
| Q1 | 0 | 0 |
| median | 0 | 0 |
| Q3 | 0 | 1 |
| 95-th percentile | 2 | 2 |
| Maximum | 5 | 5 |
| Range | 5 | 5 |
| Interquartile range (IQR) | 0 | 1 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 0.83081007 | 0.80274194 |
| Coefficient of variation (CV) | 2.0700631 | 2.0227283 |
| Kurtosis | 7.9986247 | 8.3339514 |
| Mean | 0.40134529 | 0.39686099 |
| Median Absolute Deviation (MAD) | 0 | 0 |
| Skewness | 2.5835274 | 2.564672 |
| Sum | 179 | 177 |
| Variance | 0.69024538 | 0.64439462 |
| Monotonicity | Not monotonic | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 336 | |
| 1 | 60 | 13.5% |
| 2 | 40 | 9.0% |
| 3 | 4 | 0.9% |
| 5 | 3 | 0.7% |
| 4 | 3 | 0.7% |
| Value | Count | Frequency (%) |
| 0 | 333 | |
| 1 | 64 | 14.3% |
| 2 | 42 | 9.4% |
| 5 | 3 | 0.7% |
| 4 | 2 | 0.4% |
| 3 | 2 | 0.4% |
| Value | Count | Frequency (%) |
| 0 | 336 | |
| 1 | 60 | 13.5% |
| 2 | 40 | 9.0% |
| 3 | 4 | 0.9% |
| 4 | 3 | 0.7% |
| 5 | 3 | 0.7% |
| Value | Count | Frequency (%) |
| 0 | 333 | |
| 1 | 64 | 14.3% |
| 2 | 42 | 9.4% |
| 3 | 2 | 0.4% |
| 4 | 2 | 0.4% |
| 5 | 3 | 0.7% |
Ticket
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 379 | 376 |
| Distinct (%) | 85.0% | 84.3% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Dataset A
| Value | Count |
|---|---|
| 347082 | 6 |
| 19950 | 4 |
| 3101295 | 4 |
| W./C. 6608 | 4 |
| 13502 | 3 |
| Other values (374) | |
Dataset B
| Value | Count |
|---|---|
| 347082 | 5 |
| 3101295 | 4 |
| 1601 | 4 |
| CA. 2343 | 4 |
| LINE | 3 |
| Other values (371) | |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 18 | 18 |
| Median length | 17 | 17 |
| Mean length | 6.6860987 | 6.7511211 |
| Min length | 3 | 3 |
Characters and Unicode
| | Dataset A | Dataset B |
|---|---|---|
| Total characters | 2982 | 3011 |
| Distinct characters | 32 | 32 |
| Distinct categories | 5 | 5 |
| Distinct scripts | 2 | 2 |
| Distinct blocks | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 330 | 322 |
| Unique (%) | 74.0% | 72.2% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | 250655 | 349912 |
| 2nd row | 233866 | C 7077 |
| 3rd row | STON/O2. 3101279 | 347054 |
| 4th row | STON/O2. 3101282 | W/C 14208 |
| 5th row | 371362 | 112050 |
Common Values
| Value | Count | Frequency (%) |
| 347082 | 6 | 1.3% |
| 19950 | 4 | 0.9% |
| 3101295 | 4 | 0.9% |
| W./C. 6608 | 4 | 0.9% |
| 13502 | 3 | 0.7% |
| 347742 | 3 | 0.7% |
| 347088 | 3 | 0.7% |
| SC/Paris 2123 | 3 | 0.7% |
| 363291 | 3 | 0.7% |
| CA. 2343 | 3 | 0.7% |
| Other values (369) | 410 |
| Value | Count | Frequency (%) |
| 347082 | 5 | 1.1% |
| 3101295 | 4 | 0.9% |
| 1601 | 4 | 0.9% |
| CA. 2343 | 4 | 0.9% |
| LINE | 3 | 0.7% |
| W./C. 6608 | 3 | 0.7% |
| 110413 | 3 | 0.7% |
| 382652 | 3 | 0.7% |
| 17421 | 3 | 0.7% |
| 345773 | 3 | 0.7% |
| Other values (366) | 411 |
Plots (not reproduced): Length; Common Values (Plot)
Dataset A: Number of variable categories passes threshold (config.plot.cat_freq.max_unique)
Dataset B: Number of variable categories passes threshold (config.plot.cat_freq.max_unique)
Words
| Value | Count | Frequency (%) |
| pc | 27 | 4.9% |
| c.a | 15 | 2.7% |
| a/5 | 10 | 1.8% |
| sc/paris | 7 | 1.3% |
| 347082 | 6 | 1.1% |
| ca | 6 | 1.1% |
| w./c | 5 | 0.9% |
| 3101295 | 4 | 0.7% |
| 6608 | 4 | 0.7% |
| soton/oq | 4 | 0.7% |
| Other values (394) | 468 |
| Value | Count | Frequency (%) |
| pc | 25 | 4.4% |
| c.a | 13 | 2.3% |
| a/5 | 9 | 1.6% |
| ston/o | 7 | 1.2% |
| 2 | 7 | 1.2% |
| ca | 6 | 1.1% |
| 347082 | 5 | 0.9% |
| sc/paris | 5 | 0.9% |
| w./c | 5 | 0.9% |
| soton/o.q | 4 | 0.7% |
| Other values (395) | 476 |
Most occurring characters
| Value | Count | Frequency (%) |
| 3 | 373 | |
| 1 | 345 | |
| 2 | 289 | |
| 7 | 243 | |
| 4 | 237 | |
| 0 | 206 | 6.9% |
| 5 | 203 | 6.8% |
| 6 | 201 | 6.7% |
| 9 | 172 | 5.8% |
| 8 | 141 | 4.7% |
| Other values (22) | 572 |
| Value | Count | Frequency (%) |
| 3 | 380 | |
| 1 | 340 | |
| 2 | 297 | |
| 7 | 259 | |
| 4 | 240 | |
| 6 | 210 | 7.0% |
| 0 | 200 | 6.6% |
| 5 | 179 | 5.9% |
| 9 | 169 | 5.6% |
| 8 | 138 | 4.6% |
| Other values (22) | 599 |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 2410 | |
| Uppercase Letter | 305 | 10.2% |
| Other Punctuation | 141 | 4.7% |
| Space Separator | 110 | 3.7% |
| Lowercase Letter | 16 | 0.5% |
| Value | Count | Frequency (%) |
| Decimal Number | 2412 | |
| Uppercase Letter | 329 | 10.9% |
| Other Punctuation | 146 | 4.8% |
| Space Separator | 116 | 3.9% |
| Lowercase Letter | 8 | 0.3% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 3 | 373 | |
| 1 | 345 | |
| 2 | 289 | |
| 7 | 243 | |
| 4 | 237 | |
| 0 | 206 | |
| 5 | 203 | |
| 6 | 201 | |
| 9 | 172 | |
| 8 | 141 | 5.9% |
| Value | Count | Frequency (%) |
| 3 | 380 | |
| 1 | 340 | |
| 2 | 297 | |
| 7 | 259 | |
| 4 | 240 | |
| 6 | 210 | |
| 0 | 200 | |
| 5 | 179 | |
| 9 | 169 | |
| 8 | 138 | 5.7% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 110 | |
| Value | Count | Frequency (%) |
| (space) | 116 | |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 92 | |
| / | 49 |
| Value | Count | Frequency (%) |
| . | 92 | |
| / | 54 |
Uppercase Letter
| Value | Count | Frequency (%) |
| C | 73 | |
| P | 49 | |
| A | 43 | |
| O | 40 | |
| S | 35 | |
| N | 16 | 5.2% |
| T | 14 | 4.6% |
| Q | 7 | 2.3% |
| W | 7 | 2.3% |
| I | 7 | 2.3% |
| Other values (5) | 14 | 4.6% |
| Value | Count | Frequency (%) |
| C | 71 | |
| O | 53 | |
| P | 42 | |
| A | 41 | |
| S | 38 | |
| N | 23 | 7.0% |
| T | 20 | 6.1% |
| W | 10 | 3.0% |
| I | 7 | 2.1% |
| Q | 7 | 2.1% |
| Other values (5) | 17 | 5.2% |
Lowercase Letter
| Value | Count | Frequency (%) |
| a | 4 | |
| r | 4 | |
| i | 4 | |
| s | 4 |
| Value | Count | Frequency (%) |
| a | 2 | |
| r | 2 | |
| i | 2 | |
| s | 2 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 2661 | |
| Latin | 321 | 10.8% |
| Value | Count | Frequency (%) |
| Common | 2674 | |
| Latin | 337 | 11.2% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 3 | 373 | |
| 1 | 345 | |
| 2 | 289 | |
| 7 | 243 | |
| 4 | 237 | |
| 0 | 206 | |
| 5 | 203 | |
| 6 | 201 | |
| 9 | 172 | |
| 8 | 141 | 5.3% |
| Other values (3) | 251 |
| Value | Count | Frequency (%) |
| 3 | 380 | |
| 1 | 340 | |
| 2 | 297 | |
| 7 | 259 | |
| 4 | 240 | |
| 6 | 210 | |
| 0 | 200 | |
| 5 | 179 | |
| 9 | 169 | |
| 8 | 138 | 5.2% |
| Other values (3) | 262 |
Latin
| Value | Count | Frequency (%) |
| C | 73 | |
| P | 49 | |
| A | 43 | |
| O | 40 | |
| S | 35 | |
| N | 16 | 5.0% |
| T | 14 | 4.4% |
| Q | 7 | 2.2% |
| W | 7 | 2.2% |
| I | 7 | 2.2% |
| Other values (9) | 30 |
| Value | Count | Frequency (%) |
| C | 71 | |
| O | 53 | |
| P | 42 | |
| A | 41 | |
| S | 38 | |
| N | 23 | 6.8% |
| T | 20 | 5.9% |
| W | 10 | 3.0% |
| I | 7 | 2.1% |
| Q | 7 | 2.1% |
| Other values (9) | 25 | 7.4% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 2982 |
| Value | Count | Frequency (%) |
| ASCII | 3011 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 3 | 373 | |
| 1 | 345 | |
| 2 | 289 | |
| 7 | 243 | |
| 4 | 237 | |
| 0 | 206 | 6.9% |
| 5 | 203 | 6.8% |
| 6 | 201 | 6.7% |
| 9 | 172 | 5.8% |
| 8 | 141 | 4.7% |
| Other values (22) | 572 |
| Value | Count | Frequency (%) |
| 3 | 380 | |
| 1 | 340 | |
| 2 | 297 | |
| 7 | 259 | |
| 4 | 240 | |
| 6 | 210 | 7.0% |
| 0 | 200 | 6.6% |
| 5 | 179 | 5.9% |
| 9 | 169 | 5.6% |
| 8 | 138 | 4.6% |
| Other values (22) | 599 |
Fare
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 176 | 178 |
| Distinct (%) | 39.5% | 39.9% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 31.955539 | 31.655884 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| Maximum | 263 | 263 |
| Zeros | 6 | 10 |
| Zeros (%) | 1.3% | 2.2% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| 5-th percentile | 7.225 | 7.162525 |
| Q1 | 7.925 | 7.8958 |
| median | 14.5 | 14.47915 |
| Q3 | 31.275 | 31.275 |
| 95-th percentile | 108.28125 | 112.67708 |
| Maximum | 263 | 263 |
| Range | 263 | 263 |
| Interquartile range (IQR) | 23.35 | 23.3792 |
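The Range and Interquartile range rows follow directly from the other quantiles in this table. A minimal sketch that recomputes them from the reported values (the dictionary layout and the `spread` helper are illustrative, not part of the report):

```python
# Quantiles copied from the table above; the dictionary layout is illustrative.
quantiles = {
    "Dataset A": {"min": 0.0, "q1": 7.925, "q3": 31.275, "max": 263.0},
    "Dataset B": {"min": 0.0, "q1": 7.8958, "q3": 31.275, "max": 263.0},
}

def spread(q):
    """Return (range, IQR) for one dataset's quantiles."""
    return q["max"] - q["min"], q["q3"] - q["q1"]

for name, q in quantiles.items():
    value_range, iqr = spread(q)
    print(f"{name}: range={value_range:g}, IQR={iqr:.4f}")
```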
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 43.157289 | 43.17571 |
| Coefficient of variation (CV) | 1.3505417 | 1.3639079 |
| Kurtosis | 12.330276 | 11.1608 |
| Mean | 31.955539 | 31.655884 |
| Median Absolute Deviation (MAD) | 7.0042 | 7.22915 |
| Skewness | 3.243237 | 3.1029131 |
| Sum | 14252.171 | 14118.524 |
| Variance | 1862.5516 | 1864.1419 |
| Monotonicity | Not monotonic | Not monotonic |
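Several rows in this table are functions of the mean and standard deviation: the coefficient of variation is the standard deviation divided by the mean, the variance is the standard deviation squared, and the sum is the mean times the number of observations (446, with no missing Fare values). The sketch below checks that the reported figures agree with each other; the helper name and tolerance are assumptions chosen for illustration.

```python
import math

N_OBSERVATIONS = 446  # from the dataset overview; Fare has no missing values

# Reported values copied from the tables above.
reported = {
    "Dataset A": {"mean": 31.955539, "std": 43.157289, "cv": 1.3505417,
                  "variance": 1862.5516, "sum": 14252.171},
    "Dataset B": {"mean": 31.655884, "std": 43.17571, "cv": 1.3639079,
                  "variance": 1864.1419, "sum": 14118.524},
}

def derived(stats, n=N_OBSERVATIONS):
    """Recompute CV, variance and sum from the mean and standard deviation."""
    return {
        "cv": stats["std"] / stats["mean"],
        "variance": stats["std"] ** 2,
        "sum": stats["mean"] * n,
    }

for name, stats in reported.items():
    for key, value in derived(stats).items():
        # Tolerance allows for the rounding applied to the displayed figures.
        ok = math.isclose(value, stats[key], rel_tol=1e-6)
        print(f"{name} {key}: recomputed={value:.8g} reported={stats[key]} match={ok}")
```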
| Value | Count | Frequency (%) |
| 8.05 | 22 | 4.9% |
| 13 | 20 | 4.5% |
| 7.8958 | 19 | 4.3% |
| 7.75 | 18 | 4.0% |
| 26 | 14 | 3.1% |
| 10.5 | 12 | 2.7% |
| 7.8542 | 9 | 2.0% |
| 7.225 | 7 | 1.6% |
| 7.925 | 7 | 1.6% |
| 7.25 | 7 | 1.6% |
| Other values (166) | 311 |
| Value | Count | Frequency (%) |
| 7.8958 | 24 | 5.4% |
| 8.05 | 23 | 5.2% |
| 26 | 19 | 4.3% |
| 10.5 | 14 | 3.1% |
| 7.75 | 14 | 3.1% |
| 13 | 14 | 3.1% |
| 7.925 | 11 | 2.5% |
| 0 | 10 | 2.2% |
| 7.225 | 9 | 2.0% |
| 7.8542 | 8 | 1.8% |
| Other values (168) | 300 |
| Value | Count | Frequency (%) |
| 0 | 6 | |
| 4.0125 | 1 | 0.2% |
| 6.2375 | 1 | 0.2% |
| 6.4375 | 1 | 0.2% |
| 6.4958 | 1 | 0.2% |
| 6.75 | 1 | 0.2% |
| 6.8583 | 1 | 0.2% |
| 6.975 | 1 | 0.2% |
| 7.05 | 2 | 0.4% |
| 7.0542 | 1 | 0.2% |
| Value | Count | Frequency (%) |
| 0 | 10 | |
| 6.4375 | 1 | 0.2% |
| 6.45 | 1 | 0.2% |
| 6.4958 | 1 | 0.2% |
| 6.75 | 1 | 0.2% |
| 6.8583 | 1 | 0.2% |
| 6.95 | 1 | 0.2% |
| 6.975 | 1 | 0.2% |
| 7.05 | 2 | 0.4% |
| 7.0542 | 1 | 0.2% |
Cabin
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 98 | 81 |
| Distinct (%) | 86.7% | 80.2% |
| Missing | 333 | 345 |
| Missing (%) | 74.7% | 77.4% |
| Memory size | 7.0 KiB | 7.0 KiB |
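The two percentage rows above use different denominators: Missing (%) is a share of all 446 rows, while Distinct (%) matches the share of non-missing rows only. This reading is inferred from the reported numbers rather than taken from the profiler's source, so treat the sketch below as a consistency check under that assumption.

```python
N_OBSERVATIONS = 446  # rows per dataset, from the overview

def percent(part, whole):
    """Percentage rounded to one decimal place, as displayed in the report."""
    return round(100 * part / whole, 1)

def cabin_summary(distinct, missing, n=N_OBSERVATIONS):
    present = n - missing  # rows that actually carry a Cabin value
    return {
        "missing_pct": percent(missing, n),          # share of all rows
        "distinct_pct": percent(distinct, present),  # share of non-missing rows
    }

print("Dataset A:", cabin_summary(distinct=98, missing=333))
print("Dataset B:", cabin_summary(distinct=81, missing=345))
```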
Dataset A
| Value | Count |
|---|---|
| C23 C25 C27 | 4 |
| G6 | 3 |
| F33 | 3 |
| C78 | 2 |
| C124 | 2 |
| Other values (93) | |
Dataset B
| Value | Count |
|---|---|
| F33 | 3 |
| E25 | 2 |
| E33 | 2 |
| E67 | 2 |
| C123 | 2 |
| Other values (76) | |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 15 | 15 |
| Median length | 3 | 3 |
| Mean length | 3.4955752 | 3.6633663 |
| Min length | 1 | 1 |
Characters and Unicode
| | Dataset A | Dataset B |
|---|---|---|
| Total characters | 395 | 370 |
| Distinct characters | 19 | 18 |
| Distinct categories | 3 | 3 |
| Distinct scripts | 2 | 2 |
| Distinct blocks | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 87 | 62 |
| Unique (%) | 77.0% | 61.4% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | C126 | G6 |
| 2nd row | D15 | A36 |
| 3rd row | C125 | B18 |
| 4th row | C23 C25 C27 | C68 |
| 5th row | E49 | C91 |
Common Values
| Value | Count | Frequency (%) |
| C23 C25 C27 | 4 | 0.9% |
| G6 | 3 | 0.7% |
| F33 | 3 | 0.7% |
| C78 | 2 | 0.4% |
| C124 | 2 | 0.4% |
| C65 | 2 | 0.4% |
| D | 2 | 0.4% |
| E121 | 2 | 0.4% |
| C83 | 2 | 0.4% |
| C126 | 2 | 0.4% |
| Other values (88) | 89 | 20.0% |
| (Missing) | 333 | 74.7% |
| Value | Count | Frequency (%) |
| F33 | 3 | 0.7% |
| E25 | 2 | 0.4% |
| E33 | 2 | 0.4% |
| E67 | 2 | 0.4% |
| C123 | 2 | 0.4% |
| C22 C26 | 2 | 0.4% |
| G6 | 2 | 0.4% |
| B58 B60 | 2 | 0.4% |
| D20 | 2 | 0.4% |
| B96 B98 | 2 | 0.4% |
| Other values (71) | 80 | 17.9% |
| (Missing) | 345 | 77.4% |
Length
Common Values (Plot)
Dataset A
Number of variable categories passes threshold (config.plot.cat_freq.max_unique)
Dataset B
Number of variable categories passes threshold (config.plot.cat_freq.max_unique)
| Value | Count | Frequency (%) |
| c23 | 4 | 3.1% |
| c27 | 4 | 3.1% |
| c25 | 4 | 3.1% |
| g6 | 3 | 2.3% |
| f33 | 3 | 2.3% |
| c78 | 2 | 1.6% |
| c124 | 2 | 1.6% |
| c65 | 2 | 1.6% |
| d | 2 | 1.6% |
| e121 | 2 | 1.6% |
| Other values (98) | 101 |
| Value | Count | Frequency (%) |
| f33 | 3 | 2.5% |
| b20 | 2 | 1.7% |
| e25 | 2 | 1.7% |
| c92 | 2 | 1.7% |
| b66 | 2 | 1.7% |
| b63 | 2 | 1.7% |
| b59 | 2 | 1.7% |
| c93 | 2 | 1.7% |
| b18 | 2 | 1.7% |
| c68 | 2 | 1.7% |
| Other values (83) | 98 |
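The lowercase tokens in the two tables above are consistent with splitting each Cabin value on whitespace and lowercasing the pieces: `C23 C25 C27` occurs 4 times in Dataset A, and `c23`, `c25` and `c27` each have a count of 4. The sketch below is a rough approximation of that counting, not the profiler's actual tokenizer, and the `sample` list is a hypothetical subset built from the reported counts.

```python
from collections import Counter

def word_counts(values):
    """Count lowercase whitespace-separated tokens, skipping missing entries."""
    counts = Counter()
    for value in values:
        if value is None:  # missing Cabin values contribute no words
            continue
        counts.update(value.lower().split())
    return counts

# Hypothetical sample built from counts reported for Dataset A.
sample = ["C23 C25 C27"] * 4 + ["G6"] * 3 + ["F33"] * 3 + [None] * 2
print(word_counts(sample).most_common(5))
```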
Most occurring characters
| Value | Count | Frequency (%) |
| C | 44 | |
| 2 | 43 | |
| 1 | 33 | 8.4% |
| 3 | 32 | 8.1% |
| B | 26 | 6.6% |
| 5 | 25 | 6.3% |
| 6 | 24 | 6.1% |
| 7 | 23 | 5.8% |
| 4 | 22 | 5.6% |
| D | 21 | 5.3% |
| Other values (9) | 102 |
| Value | Count | Frequency (%) |
| 2 | 39 | |
| B | 38 | 10.3% |
| C | 32 | 8.6% |
| 3 | 31 | 8.4% |
| 6 | 28 | 7.6% |
| 1 | 24 | 6.5% |
| 4 | 21 | 5.7% |
| 5 | 20 | 5.4% |
| 7 | 20 | 5.4% |
| 8 | 20 | 5.4% |
| Other values (8) | 97 |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 250 | |
| Uppercase Letter | 129 | |
| Space Separator | 16 | 4.1% |
| Value | Count | Frequency (%) |
| Decimal Number | 233 | |
| Uppercase Letter | 119 | |
| Space Separator | 18 | 4.9% |
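The category names above correspond to Unicode general categories (`Nd`, `Lu`, `Zs`). A minimal sketch of how such a tally can be produced with Python's standard `unicodedata` module; the name mapping and the three sample strings are illustrative, and the profiler's own implementation may differ.

```python
import unicodedata
from collections import Counter

# Readable names for the general-category codes that occur in Cabin values.
CATEGORY_NAMES = {
    "Nd": "Decimal Number",
    "Lu": "Uppercase Letter",
    "Zs": "Space Separator",
}

def category_counts(values):
    """Tally characters by Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts

# Three Cabin values taken from the Common Values tables.
print(category_counts(["C23 C25 C27", "G6", "F33"]))
```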
Most frequent character per category
Uppercase Letter
| Value | Count | Frequency (%) |
| C | 44 | |
| B | 26 | |
| D | 21 | |
| E | 18 | |
| A | 9 | 7.0% |
| F | 6 | 4.7% |
| G | 4 | 3.1% |
| T | 1 | 0.8% |
| Value | Count | Frequency (%) |
| B | 38 | |
| C | 32 | |
| D | 20 | |
| E | 16 | |
| F | 6 | 5.0% |
| A | 4 | 3.4% |
| G | 3 | 2.5% |
Decimal Number
| Value | Count | Frequency (%) |
| 2 | 43 | |
| 1 | 33 | |
| 3 | 32 | |
| 5 | 25 | |
| 6 | 24 | |
| 7 | 23 | |
| 4 | 22 | |
| 8 | 20 | |
| 0 | 18 | |
| 9 | 10 | 4.0% |
| Value | Count | Frequency (%) |
| 2 | 39 | |
| 3 | 31 | |
| 6 | 28 | |
| 1 | 24 | |
| 4 | 21 | |
| 5 | 20 | |
| 7 | 20 | |
| 8 | 20 | |
| 9 | 16 | |
| 0 | 14 | 6.0% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 16 | |
| Value | Count | Frequency (%) |
| (space) | 18 | |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 266 | |
| Latin | 129 |
| Value | Count | Frequency (%) |
| Common | 251 | |
| Latin | 119 |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| C | 44 | |
| B | 26 | |
| D | 21 | |
| E | 18 | |
| A | 9 | 7.0% |
| F | 6 | 4.7% |
| G | 4 | 3.1% |
| T | 1 | 0.8% |
| Value | Count | Frequency (%) |
| B | 38 | |
| C | 32 | |
| D | 20 | |
| E | 16 | |
| F | 6 | 5.0% |
| A | 4 | 3.4% |
| G | 3 | 2.5% |
Common
| Value | Count | Frequency (%) |
| 2 | 43 | |
| 1 | 33 | |
| 3 | 32 | |
| 5 | 25 | |
| 6 | 24 | |
| 7 | 23 | |
| 4 | 22 | |
| 8 | 20 | |
| 0 | 18 | |
| (space) | 16 | 6.0% |
| Value | Count | Frequency (%) |
| 2 | 39 | |
| 3 | 31 | |
| 6 | 28 | |
| 1 | 24 | |
| 4 | 21 | |
| 5 | 20 | |
| 7 | 20 | |
| 8 | 20 | |
| (space) | 18 | |
| 9 | 16 | |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 395 |
| Value | Count | Frequency (%) |
| ASCII | 370 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| C | 44 | |
| 2 | 43 | |
| 1 | 33 | 8.4% |
| 3 | 32 | 8.1% |
| B | 26 | 6.6% |
| 5 | 25 | 6.3% |
| 6 | 24 | 6.1% |
| 7 | 23 | 5.8% |
| 4 | 22 | 5.6% |
| D | 21 | 5.3% |
| Other values (9) | 102 |
| Value | Count | Frequency (%) |
| 2 | 39 | |
| B | 38 | 10.3% |
| C | 32 | 8.6% |
| 3 | 31 | 8.4% |
| 6 | 28 | 7.6% |
| 1 | 24 | 6.5% |
| 4 | 21 | 5.7% |
| 5 | 20 | 5.4% |
| 7 | 20 | 5.4% |
| 8 | 20 | 5.4% |
| Other values (8) | 97 |
Embarked
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 3 | 3 |
| Distinct (%) | 0.7% | 0.7% |
| Missing | 1 | 1 |
| Missing (%) | 0.2% | 0.2% |
| Memory size | 7.0 KiB | 7.0 KiB |
Dataset A
| Value | Count |
|---|---|
| S | 321 |
| C | 83 |
| Q | 41 |
Dataset B
| Value | Count |
|---|---|
| S | 334 |
| C | 76 |
| Q | 35 |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 1 | 1 |
| Median length | 1 | 1 |
| Mean length | 1 | 1 |
| Min length | 1 | 1 |
Characters and Unicode
| | Dataset A | Dataset B |
|---|---|---|
| Total characters | 445 | 445 |
| Distinct characters | 3 | 3 |
| Distinct categories | 1 | 1 |
| Distinct scripts | 1 | 1 |
| Distinct blocks | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | S | S |
| 2nd row | S | S |
| 3rd row | S | S |
| 4th row | S | S |
| 5th row | S | S |
Common Values
| Value | Count | Frequency (%) |
| S | 321 | |
| C | 83 | 18.6% |
| Q | 41 | 9.2% |
| (Missing) | 1 | 0.2% |
| Value | Count | Frequency (%) |
| S | 334 | |
| C | 76 | 17.0% |
| Q | 35 | 7.8% |
| (Missing) | 1 | 0.2% |
Length
Common Values (Plot)
| Value | Count | Frequency (%) |
| s | 321 | |
| c | 83 | 18.7% |
| q | 41 | 9.2% |
| Value | Count | Frequency (%) |
| s | 334 | |
| c | 76 | 17.1% |
| q | 35 | 7.9% |
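The percentages for the same counts differ slightly between the Common Values tables (for example C at 18.6% in Dataset A) and the word and character tables (18.7%). The figures are consistent with Common Values dividing by all 446 rows, including the one missing value, while the other tables divide by the 445 values that are present. The sketch below reproduces both sets under that assumption; the `pct` helper is illustrative.

```python
def pct(count, total):
    """Percentage rounded to one decimal, matching the report's display."""
    return round(100 * count / total, 1)

N_ROWS = 446     # all observations, including the one missing Embarked value
N_PRESENT = 445  # observations that have an Embarked value

counts = {
    "Dataset A": {"S": 321, "C": 83, "Q": 41},
    "Dataset B": {"S": 334, "C": 76, "Q": 35},
}

for name, by_value in counts.items():
    for value, count in by_value.items():
        print(name, value, pct(count, N_ROWS), pct(count, N_PRESENT))
```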
Most occurring characters
| Value | Count | Frequency (%) |
| S | 321 | |
| C | 83 | 18.7% |
| Q | 41 | 9.2% |
| Value | Count | Frequency (%) |
| S | 334 | |
| C | 76 | 17.1% |
| Q | 35 | 7.9% |
Most occurring categories
| Value | Count | Frequency (%) |
| Uppercase Letter | 445 |
| Value | Count | Frequency (%) |
| Uppercase Letter | 445 |
Most frequent character per category
Uppercase Letter
| Value | Count | Frequency (%) |
| S | 321 | |
| C | 83 | 18.7% |
| Q | 41 | 9.2% |
| Value | Count | Frequency (%) |
| S | 334 | |
| C | 76 | 17.1% |
| Q | 35 | 7.9% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 445 |
| Value | Count | Frequency (%) |
| Latin | 445 |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| S | 321 | |
| C | 83 | 18.7% |
| Q | 41 | 9.2% |
| Value | Count | Frequency (%) |
| S | 334 | |
| C | 76 | 17.1% |
| Q | 35 | 7.9% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 445 |
| Value | Count | Frequency (%) |
| ASCII | 445 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| S | 321 | |
| C | 83 | 18.7% |
| Q | 41 | 9.2% |
| Value | Count | Frequency (%) |
| S | 334 | |
| C | 76 | 17.1% |
| Q | 35 | 7.9% |
Dataset A
| | PassengerId | Age | SibSp | Parch | Fare | Survived | Pclass | Sex | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|
| PassengerId | 1.000 | 0.044 | -0.012 | 0.007 | -0.035 | 0.106 | 0.000 | 0.000 | 0.089 | 0.038 |
| Age | 0.044 | 1.000 | -0.184 | -0.234 | 0.165 | 0.175 | 0.295 | 0.106 | 0.223 | 0.140 |
| SibSp | -0.012 | -0.184 | 1.000 | 0.437 | 0.442 | 0.134 | 0.125 | 0.183 | 0.000 | 0.066 |
| Parch | 0.007 | -0.234 | 0.437 | 1.000 | 0.416 | 0.121 | 0.000 | 0.268 | 0.000 | 0.080 |
| Fare | -0.035 | 0.165 | 0.442 | 0.416 | 1.000 | 0.258 | 0.532 | 0.234 | 0.380 | 0.230 |
| Survived | 0.106 | 0.175 | 0.134 | 0.121 | 0.258 | 1.000 | 0.324 | 0.537 | 0.000 | 0.122 |
| Pclass | 0.000 | 0.295 | 0.125 | 0.000 | 0.532 | 0.324 | 1.000 | 0.109 | 0.369 | 0.244 |
| Sex | 0.000 | 0.106 | 0.183 | 0.268 | 0.234 | 0.537 | 0.109 | 1.000 | 0.000 | 0.077 |
| Cabin | 0.089 | 0.223 | 0.000 | 0.000 | 0.380 | 0.000 | 0.369 | 0.000 | 1.000 | 0.356 |
| Embarked | 0.038 | 0.140 | 0.066 | 0.080 | 0.230 | 0.122 | 0.244 | 0.077 | 0.356 | 1.000 |
Dataset B
| | PassengerId | Age | SibSp | Parch | Fare | Survived | Pclass | Sex | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|
| PassengerId | 1.000 | 0.020 | -0.055 | 0.019 | -0.040 | 0.098 | 0.018 | 0.092 | 0.000 | 0.000 |
| Age | 0.020 | 1.000 | -0.202 | -0.280 | 0.120 | 0.129 | 0.292 | 0.037 | 0.269 | 0.000 |
| SibSp | -0.055 | -0.202 | 1.000 | 0.425 | 0.456 | 0.178 | 0.125 | 0.178 | 0.446 | 0.000 |
| Parch | 0.019 | -0.280 | 0.425 | 1.000 | 0.471 | 0.216 | 0.072 | 0.300 | 0.419 | 0.022 |
| Fare | -0.040 | 0.120 | 0.456 | 0.471 | 1.000 | 0.271 | 0.534 | 0.187 | 0.459 | 0.232 |
| Survived | 0.098 | 0.129 | 0.178 | 0.216 | 0.271 | 1.000 | 0.316 | 0.514 | 0.297 | 0.122 |
| Pclass | 0.018 | 0.292 | 0.125 | 0.072 | 0.534 | 0.316 | 1.000 | 0.131 | 0.452 | 0.230 |
| Sex | 0.092 | 0.037 | 0.178 | 0.300 | 0.187 | 0.514 | 0.131 | 1.000 | 0.000 | 0.000 |
| Cabin | 0.000 | 0.269 | 0.446 | 0.419 | 0.459 | 0.297 | 0.452 | 0.000 | 1.000 | 0.454 |
| Embarked | 0.000 | 0.000 | 0.000 | 0.022 | 0.232 | 0.122 | 0.230 | 0.000 | 0.454 | 1.000 |
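The High Correlation alerts in the overview (Fare with Pclass, Survived with Sex) line up with the only off-diagonal entries in these matrices that reach 0.5. The sketch below filters a subset of the Dataset A matrix with that cutoff. The 0.5 threshold is an assumption inferred from which pairs are flagged, and `high_correlation_pairs` is a hypothetical helper, not a profiler function.

```python
# Subset of the Dataset A matrix above (one entry per unordered pair).
corr_a = {
    ("Fare", "Pclass"): 0.532,
    ("Fare", "SibSp"): 0.442,
    ("Fare", "Survived"): 0.258,
    ("Pclass", "Survived"): 0.324,
    ("Sex", "Survived"): 0.537,
    ("Pclass", "Sex"): 0.109,
}

THRESHOLD = 0.5  # assumed cutoff, inferred from the flagged pairs

def high_correlation_pairs(corr, threshold=THRESHOLD):
    """Return variable pairs whose absolute coefficient meets the threshold."""
    return sorted(pair for pair, value in corr.items() if abs(value) >= threshold)

print(high_correlation_pairs(corr_a))
```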
Dataset A
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 427 | 428 | 1 | 2 | Phillips, Miss. Kate Florence ("Mrs Kate Louise Phillips Marshall") | female | 19.0 | 0 | 0 | 250655 | 26.0000 | NaN | S |
| 864 | 865 | 0 | 2 | Gill, Mr. John William | male | 24.0 | 0 | 0 | 233866 | 13.0000 | NaN | S |
| 403 | 404 | 0 | 3 | Hakkarainen, Mr. Pekka Pietari | male | 28.0 | 1 | 0 | STON/O2. 3101279 | 15.8500 | NaN | S |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 160 | 161 | 0 | 3 | Cribb, Mr. John Hatfield | male | 44.0 | 0 | 1 | 371362 | 16.1000 | NaN | S |
| 811 | 812 | 0 | 3 | Lester, Mr. James | male | 39.0 | 0 | 0 | A/4 48871 | 24.1500 | NaN | S |
| 733 | 734 | 0 | 2 | Berriman, Mr. William John | male | 23.0 | 0 | 0 | 28425 | 13.0000 | NaN | S |
| 421 | 422 | 0 | 3 | Charters, Mr. David | male | 21.0 | 0 | 0 | A/5. 13032 | 7.7333 | NaN | Q |
| 221 | 222 | 0 | 2 | Bracken, Mr. James H | male | 27.0 | 0 | 0 | 220367 | 13.0000 | NaN | S |
| 131 | 132 | 0 | 3 | Coelho, Mr. Domingos Fernandeo | male | 20.0 | 0 | 0 | SOTON/O.Q. 3101307 | 7.0500 | NaN | S |
Dataset B
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 675 | 676 | 0 | 3 | Edvardsson, Mr. Gustaf Hjalmar | male | 18.0 | 0 | 0 | 349912 | 7.7750 | NaN | S |
| 376 | 377 | 1 | 3 | Landergren, Miss. Aurora Adelia | female | 22.0 | 0 | 0 | C 7077 | 7.2500 | NaN | S |
| 251 | 252 | 0 | 3 | Strom, Mrs. Wilhelm (Elna Matilda Persson) | female | 29.0 | 1 | 1 | 347054 | 10.4625 | G6 | S |
| 219 | 220 | 0 | 2 | Harris, Mr. Walter | male | 30.0 | 0 | 0 | W/C 14208 | 10.5000 | NaN | S |
| 806 | 807 | 0 | 1 | Andrews, Mr. Thomas Jr | male | 39.0 | 0 | 0 | 112050 | 0.0000 | A36 | S |
| 575 | 576 | 0 | 3 | Patchett, Mr. George | male | 19.0 | 0 | 0 | 358585 | 14.5000 | NaN | S |
| 545 | 546 | 0 | 1 | Nicholson, Mr. Arthur Ernest | male | 64.0 | 0 | 0 | 693 | 26.0000 | NaN | S |
| 173 | 174 | 0 | 3 | Sivola, Mr. Antti Wilhelm | male | 21.0 | 0 | 0 | STON/O 2. 3101280 | 7.9250 | NaN | S |
| 750 | 751 | 1 | 2 | Wells, Miss. Joan | female | 4.0 | 1 | 1 | 29103 | 23.0000 | NaN | S |
| 354 | 355 | 0 | 3 | Yousif, Mr. Wazli | male | NaN | 0 | 0 | 2647 | 7.2250 | NaN | C |
Dataset A
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 66 | 67 | 1 | 2 | Nye, Mrs. (Elizabeth Ramell) | female | 29.00 | 0 | 0 | C.A. 29395 | 10.500 | F33 | S |
| 340 | 341 | 1 | 2 | Navratil, Master. Edmond Roger | male | 2.00 | 1 | 1 | 230080 | 26.000 | F2 | S |
| 491 | 492 | 0 | 3 | Windelov, Mr. Einar | male | 21.00 | 0 | 0 | SOTON/OQ 3101317 | 7.250 | NaN | S |
| 406 | 407 | 0 | 3 | Widegren, Mr. Carl/Charles Peter | male | 51.00 | 0 | 0 | 347064 | 7.750 | NaN | S |
| 37 | 38 | 0 | 3 | Cann, Mr. Ernest Charles | male | 21.00 | 0 | 0 | A./5. 2152 | 8.050 | NaN | S |
| 577 | 578 | 1 | 1 | Silvey, Mrs. William Baird (Alice Munger) | female | 39.00 | 1 | 0 | 13507 | 55.900 | E44 | S |
| 837 | 838 | 0 | 3 | Sirota, Mr. Maurice | male | NaN | 0 | 0 | 392092 | 8.050 | NaN | S |
| 320 | 321 | 0 | 3 | Dennis, Mr. Samuel | male | 22.00 | 0 | 0 | A/5 21172 | 7.250 | NaN | S |
| 755 | 756 | 1 | 2 | Hamalainen, Master. Viljo | male | 0.67 | 1 | 1 | 250649 | 14.500 | NaN | S |
| 400 | 401 | 1 | 3 | Niskanen, Mr. Juha | male | 39.00 | 0 | 0 | STON/O 2. 3101289 | 7.925 | NaN | S |
Dataset B
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 569 | 570 | 1 | 3 | Jonsson, Mr. Carl | male | 32.0 | 0 | 0 | 350417 | 7.8542 | NaN | S |
| 764 | 765 | 0 | 3 | Eklund, Mr. Hans Linus | male | 16.0 | 0 | 0 | 347074 | 7.7750 | NaN | S |
| 343 | 344 | 0 | 2 | Sedgwick, Mr. Charles Frederick Waddington | male | 25.0 | 0 | 0 | 244361 | 13.0000 | NaN | S |
| 781 | 782 | 1 | 1 | Dick, Mrs. Albert Adrian (Vera Gillespie) | female | 17.0 | 1 | 0 | 17474 | 57.0000 | B20 | S |
| 579 | 580 | 1 | 3 | Jussila, Mr. Eiriik | male | 32.0 | 0 | 0 | STON/O 2. 3101286 | 7.9250 | NaN | S |
| 738 | 739 | 0 | 3 | Ivanoff, Mr. Kanio | male | NaN | 0 | 0 | 349201 | 7.8958 | NaN | S |
| 813 | 814 | 0 | 3 | Andersson, Miss. Ebba Iris Alfrida | female | 6.0 | 4 | 2 | 347082 | 31.2750 | NaN | S |
| 614 | 615 | 0 | 3 | Brocklebank, Mr. William Alfred | male | 35.0 | 0 | 0 | 364512 | 8.0500 | NaN | S |
| 684 | 685 | 0 | 2 | Brown, Mr. Thomas William Solomon | male | 60.0 | 1 | 1 | 29750 | 39.0000 | NaN | S |
| 449 | 450 | 1 | 1 | Peuchen, Major. Arthur Godfrey | male | 52.0 | 0 | 0 | 113786 | 30.5000 | C104 | S |
Dataset A
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | # duplicates |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Dataset does not contain duplicate rows. | | | | | | | | | | | | | |
Dataset B
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | # duplicates |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Dataset does not contain duplicate rows. | | | | | | | | | | | | | |